@cehongwang
Collaborator

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@meta-cla meta-cla bot added the cla signed label Nov 4, 2025
@github-actions github-actions bot added component: tests Issues re: Tests component: lowering Issues re: The lowering / preprocessing passes component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Nov 4, 2025
@github-actions github-actions bot requested a review from peri044 November 4, 2025 20:05
@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py	2025-11-04 20:05:23.825034+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py	2025-11-04 20:05:55.253944+00:00
@@ -876,15 +876,14 @@
    # This is done to release CPU memory.
    for attr in dir(gm):
        if attr.startswith("_frozen_param"):
            delattr(gm, attr)

-
-
    from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS
+
    DYNAMO_CONVERTERS.disallowed_targets = set()
-    
+
    for name, _ in partitioned_module.named_children():
        submodule = getattr(partitioned_module, name)
        # filter on the GraphModule
        if not isinstance(submodule, torch.fx.graph_module.GraphModule):
            continue

Collaborator

@narendasan narendasan left a comment

Do you have a test case or something to demonstrate this feature?


logger = logging.getLogger(__name__)
NON_BREAKABLE_OP_LISTS = [
["addmm", "addmm"],
Collaborator

Just a note for implementation later.

  1. this should use an actual subgraph definition
  2. it should use pytorch op targets not strings
  3. addmm should be decomposed, right? So the graph we want is mm -> add
  4. There should be a user-facing API to modify this list, similar to what we have for passes
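The four points above could be sketched roughly as follows. This is a minimal, dependency-free illustration, not the PR's implementation: `register_non_breakable_pattern` and `unsafe_break_points` are hypothetical names, patterns are simplified to linear chains of op targets rather than full subgraphs, and `operator.matmul` / `operator.add` stand in for ATen targets such as `torch.ops.aten.mm.default` and `torch.ops.aten.add.Tensor` so the snippet runs without PyTorch.

```python
import operator
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins: a real version would use ATen op targets such as
# torch.ops.aten.mm.default and torch.ops.aten.add.Tensor.
MM = operator.matmul
ADD = operator.add

# A pattern is an ordered chain of op targets (callables, not strings).
Pattern = Tuple[Callable, ...]

# Default list: decomposed addmm, i.e. mm -> add.
_NON_BREAKABLE_PATTERNS: List[Pattern] = [(MM, ADD)]


def register_non_breakable_pattern(*targets: Callable) -> None:
    """Hypothetical user-facing hook for extending the pattern list."""
    if len(targets) < 2:
        raise ValueError("a pattern needs at least two ops")
    _NON_BREAKABLE_PATTERNS.append(tuple(targets))


def unsafe_break_points(node_targets: Sequence[Callable]) -> List[int]:
    """Return indices i where breaking between node i and i + 1 would split a pattern."""
    blocked = set()
    for pattern in _NON_BREAKABLE_PATTERNS:
        n = len(pattern)
        for start in range(len(node_targets) - n + 1):
            if tuple(node_targets[start:start + n]) == pattern:
                # Every internal edge of the matched chain is off-limits.
                blocked.update(range(start, start + n - 1))
    return sorted(blocked)
```

For example, `unsafe_break_points([MM, ADD, operator.mul])` reports that the edge between the `mm` and the `add` must not be cut. A real version would presumably match FX subgraphs (for example with `torch.fx.passes.utils.matcher_utils.SubgraphMatcher`) instead of comparing flat target sequences.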


def calculate_num_of_break(self, subgraphs: List[Subgraph]) -> int:

def calculate_size_budget(
Collaborator

Should there be an API to define this manually?

Collaborator Author

Yeah I think so. For now you can just hardcode and play with it
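One possible shape for a manually overridable budget, as a sketch only: `calculate_size_budget` and `calculate_num_of_break` reuse the names from the PR, but the signatures, the `memory_multiplier` heuristic, and the byte-based units are assumptions. The break count is only a lower bound, because atomic subgraphs cannot always be packed to fill each piece exactly.

```python
import math
from typing import List, Optional


def calculate_size_budget(
    available_cpu_bytes: int,
    memory_multiplier: float,
    manual_budget_bytes: Optional[int] = None,
) -> int:
    """Per-engine size budget in bytes (hypothetical heuristic).

    A user-supplied budget always wins. Otherwise, assume that building an
    engine needs about `memory_multiplier` times the subgraph's size in CPU
    memory, so each piece may be at most available / multiplier.
    """
    if manual_budget_bytes is not None:
        return manual_budget_bytes
    if memory_multiplier <= 0:
        raise ValueError("memory_multiplier must be positive")
    return int(available_cpu_bytes / memory_multiplier)


def calculate_num_of_break(subgraph_sizes: List[int], budget_bytes: int) -> int:
    """Lower bound on graph breaks needed to keep every piece within budget."""
    if budget_bytes <= 0:
        raise ValueError("budget_bytes must be positive")
    pieces = math.ceil(sum(subgraph_sizes) / budget_bytes)
    return max(pieces - 1, 0)
```

Hardcoding for experiments then amounts to passing `manual_budget_bytes`, while leaving it as `None` falls back to the heuristic.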

@cehongwang cehongwang force-pushed the cpu-memory-graph-break branch from 7f0e504 to 18ccadf on November 5, 2025 22:03
@narendasan
Collaborator

  1. We should think about using this tech for refit vs. non-refit
  2. Make refit APIs work across graph breaks

@narendasan
Collaborator

Improve usability by automating nn.Module -> atomic fx graph

@cehongwang cehongwang force-pushed the cpu-memory-graph-break branch from 18ccadf to f03ab2c on November 6, 2025 20:06
return x


All_FUSION_PATTERNS = [
Collaborator

We could cache the graphs if we do symbolic trace on register

Collaborator Author

Do you mean tracing the graphs every time the program starts? Wouldn't that add unnecessary latency when CPU memory is sufficient? I am thinking we could use an LRU cache or something similar, so the tracing is only done once and is initialized lazily.

L2_LIMIT_FOR_TILING = -1
USE_DISTRIBUTED_MODE_TRACE = False
OFFLOAD_MODULE_TO_CPU = False
CPU_MEMORY_BUDGET = -1
Collaborator

Use an Optional instead. Since this is not a TRT API, we don't need -1 to mean "let us decide".
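A sketch of the `Optional` suggestion, assuming the budget is in bytes: `None` means the user did not set a budget, so the detected value is used. `MemorySettings` and `resolved_budget` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Previously: CPU_MEMORY_BUDGET = -1, where -1 meant "let us decide".
CPU_MEMORY_BUDGET: Optional[int] = None  # None: no user-specified budget


@dataclass
class MemorySettings:  # hypothetical, for illustration only
    cpu_memory_budget: Optional[int] = CPU_MEMORY_BUDGET

    def resolved_budget(self, detected_bytes: int) -> int:
        """Return the user's budget if set, else the detected value."""
        # Explicit None check: `if not self.cpu_memory_budget` would quietly
        # treat 0 as "unset" instead of rejecting it.
        if self.cpu_memory_budget is None:
            return detected_bytes
        if self.cpu_memory_budget <= 0:
            raise ValueError("cpu_memory_budget must be positive")
        return self.cpu_memory_budget
```

With `None` as the default, the type itself documents that the setting is optional, and an invalid value such as `-1` or `0` becomes an error instead of a magic sentinel.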

return psutil.Process().memory_info().rss / 1024 / 1024


def release_memory() -> None:
Collaborator

Did this get moved?

torch._dynamo.reset()


def compile_one(idx: int, ir: str):
Collaborator

Why is this test here?


def size_of_subgraphs(self, subgraphs: List[Subgraph]) -> List[int]:
"""
This function calculates the size of the subgraph.
Collaborator

Can you describe the algorithms here so we have reference for later?
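For reference, one plausible reading of "size of a subgraph" is the total bytes of the distinct weights it references. This sketch is an assumption, not the algorithm in the PR: subgraphs are modeled as lists of weight names, and `WeightTable` is a hypothetical stand-in for what `tensor.numel() * tensor.element_size()` would give on real parameters.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical weight table: name -> (shape, bytes per element).
WeightTable = Dict[str, Tuple[Tuple[int, ...], int]]


def size_of_subgraphs(
    subgraphs: Sequence[Sequence[str]], weights: WeightTable
) -> List[int]:
    """Bytes of distinct weights referenced by each subgraph."""
    sizes = []
    for used in subgraphs:
        total = 0
        for name in set(used):  # a weight read twice is only stored once
            shape, itemsize = weights[name]
            total += math.prod(shape) * itemsize  # numel * element size
        sizes.append(total)
    return sizes
```

Deduplicating within a subgraph matters because a shared weight is stored once per engine; whether weights shared across subgraphs should be counted in each of them depends on how the engines are built, which this sketch does not decide.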

@cehongwang cehongwang force-pushed the cpu-memory-graph-break branch from 2389d34 to 7f9373f on November 7, 2025 21:55
@cehongwang cehongwang force-pushed the cpu-memory-graph-break branch from 7f9373f to 9ee7e67 on November 7, 2025 23:38

3 participants